Lennie Wells

mentions: 2 · type: Person

// recent coverage 2 mentions

15:29

2026-07-09

lesswrong.com

ai-safety

Debate with Self-Play Best-of-N Optimization

Researchers at an undisclosed lab introduced a best-of-N (BoN) optimization method as a proxy for self-play training in debate protocols, aiming to improve scalable oversight for AI systems. Their exp…

02:31

2026-05-29

lesswrong.com

ai-safety

Suggestions for improving debate protocols in AI safety

Researchers reviewing AI safety debate protocols found that current "propose-critique-decide" models are vulnerable to gaming, where critic models exploit a "last mover advantage" by withholding key c…

// co-occurs with top 6 entities

MATS Research (1), joanv (1), Andrew Draganov (1), Daniel Tan (1), Joan Velja (1), LiveCodeBench (1)